val lines = sc.textFile("README.md")
scala> lines.getStorageLevel
res0: org.apache.spark.storage.StorageLevel = StorageLevel(disk=false, memory=false, offheap=false, deserialized=false, replication=1)
StorageLevel
StorageLevel describes how an RDD is persisted (and addresses the following concerns):
-
Does RDD use disk?
-
How much of RDD is in memory?
-
Does RDD use off-heap memory?
-
Should an RDD be serialized (while persisting)?
-
How many replicas (default:
1) to use (can only be less than40)?
There are the following StorageLevel (number _2 in the name denotes 2 replicas):
-
NONE(default) -
DISK_ONLY -
DISK_ONLY_2 -
MEMORY_ONLY(default forcacheoperation for RDDs) -
MEMORY_ONLY_2 -
MEMORY_ONLY_SER -
MEMORY_ONLY_SER_2 -
MEMORY_AND_DISK -
MEMORY_AND_DISK_2 -
MEMORY_AND_DISK_SER -
MEMORY_AND_DISK_SER_2 -
OFF_HEAP
You can check out the storage level using getStorageLevel() operation.